Optimizing Pipelined Execution for Distributed In-Memory OLAP System
Authors
Abstract
In the emerging big data era, the demand for data analysis capability in real-world applications is growing rapidly. The increasing capacity and decreasing price of memory make it both possible and attractive for a distributed OLAP system to load all of its data into memory, which significantly improves data processing performance. In this paper, we model the performance of pipelined execution in a distributed in-memory OLAP system and find that data communication among the computation nodes, which is performed by the data exchange operator, is the performance bottleneck. We therefore explore pipelined data exchange in depth and propose a novel solution that is efficient, scalable, and skew-resilient. Experimental comparisons with state-of-the-art techniques demonstrate the effectiveness of our proposals.
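The abstract does not describe the authors' exchange design, so the following is only a point of reference: a minimal Python sketch of a generic hash-partitioned exchange operator, assuming rows are `(key, payload)` tuples. The names `exchange`, `load_per_node`, `num_nodes`, and `batch_size` are hypothetical, and this is not the paper's solution. It illustrates two ideas the abstract raises: batches are forwarded to their destination as soon as they fill (pipelined rather than fully materialized exchange), and a skewed key distribution concentrates load on a single node.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Iterator, List, Tuple

# Hypothetical row layout: (partitioning key, payload).
Row = Tuple[Hashable, object]


def exchange(rows: Iterable[Row], num_nodes: int,
             batch_size: int) -> Iterator[Tuple[int, List[Row]]]:
    """Hash-partition a stream of rows across num_nodes destinations.

    Rows are buffered per destination, and a batch is yielded as soon as it
    reaches batch_size, so downstream operators can start consuming before
    the input is exhausted.
    """
    buffers: Dict[int, List[Row]] = defaultdict(list)
    for row in rows:
        dest = hash(row[0]) % num_nodes  # destination node for this key
        buf = buffers[dest]
        buf.append(row)
        if len(buf) == batch_size:
            yield dest, buf
            buffers[dest] = []  # hand off the full batch, start a fresh buffer
    # End of input: flush partially filled buffers.
    for dest, buf in buffers.items():
        if buf:
            yield dest, buf


def load_per_node(batches: Iterable[Tuple[int, List[Row]]],
                  num_nodes: int) -> List[int]:
    """Count the rows each destination receives, exposing partition skew."""
    load = [0] * num_nodes
    for dest, batch in batches:
        load[dest] += len(batch)
    return load


if __name__ == "__main__":
    # Skewed input: key 0 occurs 900 times, keys 1..100 once each.
    rows = [(0, i) for i in range(900)] + [(k, k) for k in range(1, 101)]
    load = load_per_node(exchange(rows, num_nodes=4, batch_size=64), num_nodes=4)
    print(load)  # prints [925, 25, 25, 25]
```

With this sample input, node 0 receives 925 of the 1,000 rows because every row with the hot key hashes to the same destination. Plain hash partitioning cannot spread a single hot key; a skew-resilient exchange needs some additional mechanism for that, which this sketch deliberately does not attempt.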
Similar resources
Spatial Software Pipelining on Distributed Architectures for Sparse Matrix Codes
Wire delays and communication time are forcing processors to become decentralized modules communicating through a fast, scalable interconnect. For scalability, every portion of the processor must be decentralized, including the memory system. Compilers that can take a sequential program as input and parallelize it (including the memory) across the new processors are necessary. Much research has...
A Parallel Pipelined Hough Transform
The algorithms based on Hough Transform techniques to detect complex shapes, like circles and ellipses, require excessive computing time. In order to obtain better execution times we propose a new procedure to parallelize the detection process in a distributed memory multiprocessor. The sequential algorithm splits the detection of parameters into several stages and uses a focusing algorithm to ...
Data Cloud for Distributed Data Mining via Pipelined MapReduce
Distributed data mining (DDM) which often utilizes autonomous agents is a process to extract globally interesting associations, classifiers, clusters, and other patterns from distributed data. As datasets double in size every year, moving the data repeatedly to distant CPUs brings about high communication cost. In this paper, data cloud is utilized to implement DDM in order to move the data rat...
Static scheduling of pipelined periodic tasks in distributed real-time systems
Many distributed real-time applications involve periodic activities with end-to-end timing constraints that are larger than the periods. That is, a new instance of a periodic activity will come into existence before the previous instance has been completed. Also, such activities typically involve communicating modules in a distributed system where some modules may be replicated for resilience. ...
Relaxed Operator Fusion for In-Memory Databases: Making Compilation, Vectorization, and Prefetching Work Together At Last
In-memory database management systems (DBMSs) are a key component of modern on-line analytic processing (OLAP) applications, since they provide low-latency access to large volumes of data. Because disk accesses are no longer the principal bottleneck in such systems, the focus in designing query execution engines has shifted to optimizing CPU performance. Recent systems have revived an older tec...
Journal title:
Volume, issue:
Pages: -
Publication date: 2014